Properties of the Nonparametric Maximum Likelihood ROC Model with a Monotonic 

Likelihood Ratio 



Lucas Tcheuko 

University of Maryland College Park 

Frank Samuelson 

U.S. Food and Drug Administration 
10903 New Hampshire Avenue 

Building 62, Room 3102 
Silver Spring, MD 20993-0002 



Email address: lucastaumd . edu (Lucas Tcheuko) 
Preprint submitted to Journal of Mathematical Psychology 



February 1, 2013 



Properties of the Nonparametric Maximum Likelihood ROC Model with a Monotonic 

Likelihood Ratio 



Lucas Tcheuko 

University of Maryland College Park 

Frank Samuelson 

U.S. Food and Drug Administration 
10903 New Hampshire Avenue 

Building 62, Room 3102 
Silver Spring, MD 20993-0002 



Abstract 

We expect that some observers in perceptual signal detection experiments, such as radiologists, will make rational decisions, and 
therefore ratings from those observers are expected to form a convex ROC curve. However, measured and published curves are 
often not convex. This article examines the convexity-constrained nonparametric maximum likelihood estimator of the ROC curve 
given by Lloyd (2002). Like Lloyd we use the Pool Adjacent Violator Algorithm (PAVA) to construct the estimate of the convex 
curve. We present a direct proof that this estimate is a convex hull of the empirical ROC curve. The estimate is simple to construct 
by hand, and follows the suggestions by Pesce, et al. (2010). 

We examine the properties of this constrained nonparametric maximum likelihood estimator (NPMLE) under a large number 
of experimental conditions. In particular we examine the behavior of the area under the curve which is often used as summary 
metric of diagnostic performance. This constrained ROC estimator gives an area under the curve (AUC) estimate that is biased 
high with respect to the usual empirical AUC estimate, but may be less biased with respect to the underlying continuous true AUC 
value. The constrained ROC estimator has lower variance than the usual empirical one. Unlike previous authors who used complex 
bootstrapping to estimate the variance of the constrained NPMLE we demonstrate that standard unbiased estimators of variance 
work well to estimate the variance of the NPMLE AUC. 

Keywords: constrained estimates, empirical likelihood, PAVA algorithm, convex ROC curves 



1. Introduction 

In signal detection experiments, such as diagnostic medical 
imaging studies, an image is presented to an observer who gives 
an ordered categorical rating or score that indicates his confi- 
dence in the presence of a signal or disease in that image. For 
example, a subject may give a rating of one through five, with 
five indicating the observer is very confident that the image con- 
tains a signal. The probability distribution of ratings given to 
the subjects who have disease we call F, and the distribution of 
ratings given to the subjects who do not have disease we call 
H. Example densities are shown in Figure [T] Typically we as- 
sume that these ratings arise from some continuous underlying 
perceptual distribution and are discretized by the observer for 
convenience. 

From this task we can create measures of diagnostic perfor- 
mance. At each rating category we can calculate the true pos- 
tive rate (TPR), also called sensitivity, and the false positive rate 
(FPR) or 1 -specificity. The plot of TPR versus FPR across all 
ordered categories yields an empirical estimate of the Receiver 
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Operating Characteristic (ROC) curve (Green & Swets 1966). 
An example of such a curve is shown in Figure ]3j~The area 
under an ROC curve (AUC) is the probability that the radiolo- 
gist will give a higher rating to a diseased patient than a patient 
without disease and is used to rate the performance of the diag- 
noses. 

An ROC curve that is not convex implies that a human ob- 
server, such as a radiologist, will give high ratings to some of 
the images with signal or disease, but also give such images 
very low ratings, even lower than the ratings given to com- 
pletely images with no signal. The higher ratings are expected, 
but the lower ratings are irrational. There is no reason in gen- 
eral that a human observer would give the lowest of his ratings 
to images that really contain a signal. Logically, scientists be- 
lieve that in the limit of barely perceptible signals the worst that 
human observers could do is guess, in other words, assign a 
low rating to a signal present or signal absent image with equal 
probability. Theoretically the likelihood ratio of a pair of distri- 
butions, dF/dH, that model human detection responses should 
be a monotonically increasing function of the rating given by 
the observer. Models of ROC data that have monotonic likeli- 
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hood ratios are called "proper. " (|Green & Swets 1966 Dorf- 



man et al.j [T997} |Metz & Pan||1999||Lloyd| 12002) |Pesce et al 



(2010 1 detail why we should expect to see convex ROC curves 



under the assumption that radiologists would perform their task 
at least as accurately as guessing, and they propose a method 
for constructing a convex ROC curve estimate. 

In this paper we examine the nonparametric maximum likeli- 
hood estimate (NPMLE) of the ROC curve under the constraint 
that its likelihood ratio be monotonic that was first developed by 
Lloyd| ( p002l l. Like |Lim & Won| ( |20"l2| we prove that this esti- 
mate is equivalent to a simple convex hull around the usual em- 
pirical or unconstrained nonparametric ROC curve MLE. Un- 



like Lim & Won (2012 1 we prove this directly from the likeli- 



hood and constraints thereon. 

This paper examines the properties of the constrained ROC 
NPMLE under a large number of population parameters. In 
particular we evaluate the properties of the associated AUC es- 
timate, the area under the the constrained ROC NPMLE. We 
demonstrate that this estimate has less variance than an uncon- 
strained estimator, and it may have less bias for discretized data. 

Unlike poyd| ( p002l l and |Lim & Won| ( pOT2| , who used nu- 
merical bootstrap techniques to estimate the variance of pa- 
rameters of the constrained ROC curve MLE, we have found 
that standard estimators of AUC variance may be used to ob- 
tain variance estimates of constrained nonparametric maximum 
likelihood AUC estimate. Over a wide range of simulation con- 
figurations we have found that these estimates are nearly unbi- 
ased and very robust. In summary this constrained estimator 
has a rational convex ROC curve, is simple to construct, it may 
have small bias, its variance is easily estimated, and therefore it 
may be useful in observer performance experiments. 

2. Data 

2.1. Notation 

Our data of interest is typically ordered categorical data col- 
lected from humans observing two classes of objects. 
Let 

x\ , . . . , x?i be an IID random sample of ratings or scores from 
the signal-absent or non-diseased population with density h, let 

yi , . . . ,yM be a random sample from the signal-present or dis- 
eased population with density /. All ratings are assumed to 
be discrete in k ordered categories. Each signal-present score 
y s , s = 1 . . . M corresponds to a category index d s , and each 
signal-absent score x r , r — 1 . . .N corresponds to a category c r . 
There are m, signal observations and «, signal-absent observa- 
tions in the i th category. For simplicity we assumed the scores in 
each signal-absent category i to be all identically equal to x, and 
the scores in each signal-present category j to be all identically 
equal to yj. 



Let N = n i< an d M = ^ m, . 
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Figure 1: This figure shows two discrete density functions of 
ordinal observations from two different classes of objects, im- 
ages with disease and images with no disease. This data is from 
|Chan et aL] ( |2001| l and is also given in Table [2] The empirical 
cumulative distributions of these data sets can be plotted against 
each other to form the ROC curve shown in Figure [3] 



2.2. Example Data Sets 



As examples we have selected data sets from a study by Chan 



et al. (2001). In the study, radiologists reviewed X-ray images 
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of mammographic microcalcifications (lesions) that were digi- 
tized at different resolutions. Every radiologist rated each lesion 
from one to ten at every resolution. Higher ratings indicated a 
radiologist's greater confidence in the malignancy of the lesion. 
Radiologists gave lower ratings to lesions that they believed to 
be benign. The study measured the ability of radiologists to 
differentiate malignant lesions from benign lesions as a func- 
tion of the image resolution as measured by the area under the 
ROC curve (AUC). In the study the authors used a bi-normal 
parametric method as in Dorfman & Alf ( 1969 ) to estimate the 
ROC curve from the ten-category data. 

The distribution of scores from radiologist 5 evaluating the 
lesions at a resolution 140 microns is given in Figure[T] 

Figure 3a shows the empirical ROC curve derived from this 



data. Easily visible in the figure are the 10 solid line segments 
derived from the 10 rating categories. We observe that the em- 
pirical ROC curve is not convex for this given data set. Later 
in Section|7]we use this data and ratings from radiologist 4 at a 
resolution of 35 microns to demonstrate the methods described 
in this paper. 

In this experiment we have confidence that doctors will not 
systematically give lower ratings to images that contain lesions, 
nor will they 

systematically give higher ratings to images with no le- 
sions. Therefore we believe the true underlying population 
ROC curves will be convex, and any nonconvexity in these em- 
pirical ROC curve estimates is merely due to statistical uncer- 
tainty from finite sample size. The average ROC curve across 



all radiologists and experiments in Figure 7 of Chan et al. 



(2001 ) supports this assertion. 



3. Likelihood Function 



The usual likelihood (Owen 2001 1988 1 of obtaining the 
categorized ratings for both signal present and absent observa- 
tions from our experiments given our estimates of F and H is 



(1) 



where p t = dF(y{) = F(y;) - F(y— ); q t = dH(xi) = H(x f ) - 
H(xj-), and y~ is the largest possible value of y less than y,. As 
in the power biased model Qin & Lawless ( 1994 ), Vardi ( 1985), 
and Vardi ( 1982[ l, we set w = p/q. The likelihood function 
becomes 



W; . 



(2) 



We use the Lagrange multipliers as in Anderson ( 1972) to 
maximize the likelihood function. Maximizing Jz? subject to 

* k 

} t qi = 1 and ^ q(w { — 1 is equivalent to minimizing 



log(Jzf) = Yj -fa + '"<) lo gfe) - m i lo lOi) + C (3) 



subject to the same constraints, where C is a constant. 



By first minimizing - log(Jz? ) as a function of q, ■ . . . , q^ we 
obtain q, = Plugging for q t in Equation g and 

ignoring the constant terms we are then led to minimize 

k 

{ = + rij) \og(N + Mwd - mi log(w/) 

i=i 

as a function of W — (w\ . . . , Wk). 



4. Estimates 



4.1. Unconstrained Estimates 

As a function of w\, . . . , w^, € is minimized for vv,- 



m,/M 



We plug Wj back to q { = and pi - qiwi to obtain the 

known empirical likelihood estimates ^, = and pi 



In what follows we will refer to w,- = 



mJM 
ni IN 



AT 

as the uncon- 



strained maximum likelihood ratio (UMLR). We should point 
out that these estimates of g,, p„ and vv, give the usual empirical 



ROC curve (Green & Swets 1966; B amber M975J. The empir- 



ical likelihood ratios w, are the slopes of the segments of the 
lines of the empirical ROC curve, i.e. the slopes of the lines in 



Figure 3 a 



4.2. Constrained Estimates 

Believing that the true ROC curve is convex, we are led to 

minimize t = /fai + «,) log(W + Mw,) - nti log(w,) subject to 

!=1 

the monotonic likelihood ratio constraint W\ < W2 < ■ ■ ■ < wt- 
This is equivalent to requiring that the slopes of the segments 
of the ROC curve be increasing from right to left. Let /?, and qi 
be defined as in Section |4~T| Let Pq = Qq = 1, and P t = 

k—i k-i 
7=1 7=1 

As a function of w\ , . . . , wt, € is a sum of k independent func- 
tions 1 1 , . . . , 4; where i; = (m, + «,) log(/V + Mw,) - m ; log(w ; ) 
is a function with a single minimum vv,-. As shown in Figure|2j 
(i decreases up to vv, and increases afterward. 

Without loss of generality we consider two consecutive func- 
tions £i(w) and ^(w). In Figure|2]we plot i\ and as functions 
of w\ and W2 respectively. The graph in Figure 2a shows the 



UMLR h>i of t\ less than the UMLR w 2 of /J 2 , leading to a con- 
vex ROC curve. On the contrary, in Figure 2b we have w\ > w 2 
leading to a nonconvex ROC. 

Let m>i > vi>2 as in Figure 2b Our goal is to find {w\,w£) 



such that (\(w\) + tiiwi) - min t\(w\) + Ciiwi). In this case, 

we need to change the ordering of W\ and w 2 while keeping 
(\{w\) + (2(^2) as small as possible. 

Let A{w\,t\(w\)) and B(w2, ^2(^2)) be two arbitrary points 
as in Figure|2b]such that w\ < W2- Keeping in mind that w\ and 
vv 2 are the respective global minima of (\ and (2 we obtain the 
following inequalities. 

If w\ < W2 then (\(yv2) + ^2(^2) < ^i(wi) + ^2(^2) 
If vi>i < W2 then l\^W\) + (2(^1) < £\(W\) + (2(^2) 
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If w 2 < w\ < w 2 < Wu i\ decreases while t 2 increases on 
the interval \w\, #2], leading to the following inequalities. 

t 2 (w x ) < ( 2 (w 2 ) => A(wi) + £>(wi) < A(wi) + 4(w 2 ) 
and 

l\(w 2 ) < AOi) => {i(w 2 ) + £ 2 (w 2 ) < £i(wi) + £ 2 (w 2 ). 

The above inequalities show that the function l\ + i 2 is never 
at its minimum for two distinct values of W\ and w 2 . We de- 
duce that (m>i,m>2) satisfies vvi = w 2 . Setting wi = w 2 = w we 
minimize the following expression: 

^i(w) + { 2 (w) = (mi + «i)log(iV + Mw) - mi log(w) + 
(m 2 + n 2 ) log(W + Mw) - m 2 log(w) 
= (m\ + m 2 + «i + n 2 ) \og(N + Mw) - 
(nil + mi) log(w). 

Taking the derivative and setting equal to zero we obtain iv = 
w\ — w 2 — T^-r^rn-- Note that for wi = w 2 we obtain 
l*(w) — i\(w) + l 2 (w) whose graph is similar to that of either 
(\ or i 2 from Figure [2] The above proof and conclusions ap- 
ply to any two adjacent categories in a ROC curve where the 
unconstrained likelihood ratios are not increasing. 

The above demonstrates analytically that the obtained esti- 
mate is the constrained maximum likelihood estimate of the 
non-parametric model. 

Note that the constrained estimators w - w\ — w 2 yield a 
solution that is equivalent to adding the two adjacent categories 
in to a single category with counts hi - m\ +m 2 and h-n\ +n 2 . 
To find all constrained estimates wi in a sequence of tx, we use 
an algorithm similar to Lloyd (2002). 



5. PAVA: Pool Adjacent Violator Algorithm 

The PAVA, a simple iterative algorithm, is a standard non- 
parametric method used to produce point estimates for a func- 
tion known to be monotone ( |Barlow et al.| |1972| |Robertson| 
|etaLl[T988l[Best & Chakravarti| [l990] l . 

It replaces each stretch of monotonicity-violating observa- 
tions with the weighted average of the original function values 
in that stretch. 

In order to apply the theoretical approached described above, 
we developed an R (R Development Core Team 2009 ) version 
of the PAVA algoritm similar to that of |Lloyd| ( |2002| >. As in Sec- 
tion |2j we assume the data has been discretized into k different 
categories. Keeping in mind that each signal-present score y s 
corresponds to a category d s and each signal-absent score x r 
corresponds to a category c, as assumed in Section [2] as the 
PAVA combines categories it also reassigns each data score to a 
new category. 

We sequentially describe the PAVA as follows. 

1. k <— k; for all i, Wj <— Wj - '^;mj <— m ; ; <— n,; for all 
s, and for all r, d s <— d s \ c r <— c r 



2. Stop if ROC curve is convex; meaning, W\ < ... < w~ k . 
Otherwise there exists i in 1 : k — 1 such that w, > w 1+ i . If 
so, continue. 

3. The two categories i and i + 1 are combined into category 
i. fhi <- rhj + m i+ u hi <- n,- + n, + 1 ; vv ; <- 'f^. 

4. After combining the two categories i and i + 1 into cate- 
gory i, the following commands describe how we elimi- 
nate category i + 1, reorganize the remaining categories, 
and reassign the score category indexes . 

For s = 1 . . . M, if d s > i then d s <— d s - 1 ; 

for r — 1 . . . N, if c r > i then c r <— c r — 1 ; 

for j = i + 1 . . . k— 1, hj <— Hj+i', rhj <— fhj + \ ; wj <— ivj+i ; 

k^k-l. 

5. Go to step 2. Proceed until ROC curve is convex. 

Applied to the data, the PAVA creates a new data set with total 
number of categories k which is less than or equal to k. We 
respectively denote the number of signal-absent scores in each 
new i-category, the number of signal-present scores in each new 
i-category, the new signal-present category index, and the new 
signal-absent category index of the new data set by n,, m,-, d s , 
and c r . 



The dashed line in Figure 3b is obtained from applying the 



PAVA to the ROC curve in Figure 3 a 
k 



In Figure 3a we have 
= 10. Let pi, qi, P/ and Q, be defined as in Section |4T2] In 
the first step there is a violation for i = 3, i.e W3 is greater 
than h>4. The two segments S 3 and S 4 are replaced by a single 
segment S \ from (Q 2 , P 2 ) to (Q4, P4) and W3 is set set equal to 
( m w i l' h jr-)- This leads to another violation as the new vv 3 is 
less than w 2 . We then replace S 2 and S 3 by a single segment S' 2 
from (QuPx) to (Q 4 , P A ) and w 2 is set equal to <g^g£. 

In the next step the PAVA encounters another violation be- 
tween the segments S 6 and S 7 ; the segments S 6 and S 7 are re- 
placed by a single segment S' 6 . In the final step, the PAVA 
encounters another violation as the slope of S' 6 is greater than 
the slope of S %; we then replace S '(, and S 8 by a single segment 
and the ROC is convex. 

An intuitive reason for implementing the algorithm can be 
summarized as this: 

Given that observers will not perform worse than guessing 
we assume that any non-convexity in a measured empirical 
ROC curve is due to a statistical variation from insufficient 
counts in a particular rating category. We therefore combine 
adjacent ordinal categories until the number of counts in the 
combined categories are sufficient to generate a plausible con- 
vex ROC curve. 

This PAVA algorithm is equivalent to the algorithm given 



in Section 1.2 of Robertson et al. ( 1988 1. Specifically using 



Robertson's w, g, W, and G notation let w(x,) = 4, g(xj) = 



1/f> W J = £ W > °j = Z Si^iMxi). Then the ROC curve, 

a plot of Pj = (Wj, Gj), j = 0, 1, . . . , k with P Q = (0, 0) is the 
cumulative sum diagram (CSD) for the function g with weight 
w. Adapting Robertson's approach, finding the best convex plot 
from the nonconvex one reduces to finding the greatest convex 
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minorant (GCM) of the CSD in the interval [0,Z,t] for the func- 
tion g with weight z. This method will lead to the same estimate 
obtained in Section l4~2l 

To verify that the above solution really is the maximum like- 
lihood estimate under the constraint of a monotonic likelihood 
ratio, we wrote a software function whose input is a combined 
vector of p t and qi and whose output is the likelihood function 
given in Expression [T] if the ROC curve is convex. If the ROC 
curve is not convex, the output is a value less than zero. We then 



used a numerical optimizer ("optim" from the package R ( |R De- 
velopment Core Team, 2009 )) to maximize the function over all 
inputs pi, qi on many random simulated data sets with random 
initial values. The maximum likelihood function given by the 
optimizer was usually equal to that obtained from the PAVA. 
All other times the optimizer failed to converge or converged to 
a local maximum with likelihood less than that obtained from 
the PAVA. This verified numerically that the PAVA gives the 
constrained maximum likelihood. 



6. Bias and Variance Estimation of AUC 

The Wilcoxon-Mann-Whitney statistic is equivalent to the 
area under the ROC curve (AUC) and is frequently used as a 
summary measure of separability of the observations that con- 
tain a signal from observations that do not, or separability of 
images that have disease from those who do not. The area un- 
der the population ROC curve is equal to the probability that a 
rating of an image with a signal is higher than a signal-absent 
rating, AUC= P(Y > X) (|Bamber| [1975]). The area under the 



unconstrained empirical maximum-likelihood ROC curve esti- 
mate is 

1 



AUC = 



NM 



N M 

EE'" 

r=l .s=l 



(4) 



We assume that each pair (r,s) corresponds to a pair (c r , d s ) such 
that x r belongs to the category c r and y s belongs to the category 
d s . 

I \ \fc r -d s 

Irs = \ if C r >d s (5) 

{ 1 \fc r <d s 

The area under the constrained empirical maximum- 
likelihood ROC curve estimate has the same form and is 



AUC = 



1 

NM 



N M 



where 



I\ ifc r - d s 
if C r >d s 
1 if c r < d s 



(6) 



(7) 



In this Section we use simulated data to measure how the 
constrained AUC estimate differs from the unconstrained AUC 
estimate and a true continuous AUC population value. We also 
examine ways to estimate variance of the constrained AUC es- 
timate. 



6.1. Simulation Parameters 

We simulated data from normal distributions, D=N(p., 1), 
and from uniform distributions, D=1/(0, m), which are uniform 
in probability between and m. For the uniform distributions 
H, the non-diseased or signal-absent sample is drawn from 
1/(0, 1), and the signal-present or diseased sample is drawn 
from 1/(0, m) where m > l. Both samples together form a 
combined sample. In order to have the desired AUC value for 
the uniform distribution, it follows from AUC - P(Y > X) that 
AUC — l — j-jjjj from which we derive m = 2 {i-auc) • Tins P a ' r 
of uniform distributions yields the most asymmetric ROC curve 
that still has a monotonically increasing likelihood ratio. 

For the normal distribution N, the diseased sample is drawn 
from Nip., I) and the non-diseased sample is drawn from 
N(0, l). To achieve the desired AUC, we used P(Y > X) = 



where O is the cumulative normal distribution 



function, p. Y = and p x = 0. We set p - V2<I> \AUC). 
This model gives a continous symmetric convex ROC curve. 

6.2. Bias Study 

We first considered a vector of AUC values with elements 
ranging from .6 to .99 with step equal to .01. For each AUC 
value from the above vector, we sampled data from either nor- 
mal distributions, D-N, or uniform distributions, D=1/. We 
simulated 10,000 independent random combined samples from 
D. Each combined random sample consisted of a sample of size 
55 drawn from the non-diseased distribution and a sample of 
size 110 drawn from the non-diseased population. These sam- 
ple size are typical in many studies in medical literature. We 
then discretize each sample into seven categories. For each 
discretized combined sample we computed the usual uncon- 
strained AUC estimate, then applied the PAVA algorithm and 
computed the constrained AUC estimate. 

In Figure [4] we plotted the average constrained and uncon- 
strained AUC estimates from the normally distributed simula- 
tion versus the true AUC. We first note that the average uncon- 
strained AUC estimate is always biased low with respect to the 
true undiscretized population AUC. This reduction of informa- 
tion and AUC estimates due to discretization is well known. We 
also note that the average constrained AUC estimate is always 
greater than the average unconstrained AUC estimate. This re- 
sult is consistent with the construction of the constrained ROC 
curve in Section 14.21 To construct the constrained ROC esti- 
mate we combine categories in a way that always increases the 
area under the curve. With respect to the true population AUC, 
the average constrained AUC estimate is sometimes less, some- 
times greater, depending on the value of AUC. Depending on 
AUC, sample size, and discreteness of data, the constrained es- 
timate may be less biased than the unconstrained estimate. 

6.3. Variance Estimation of the AUC 

In this Section we demonstrate empirically that a standard 
variance estimator of the empirical area under the curve (AUC) 
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w 2 = w 2 
Likelihood Ratio 



w 2 



W W! 

Likelihood Ratio 



(a) Likelihoods of a Convex Estimate 



(b) Likelihoods of a Non-convex Estimate 



Figure 2: These are graphs of the negative log likelihood components I as a function of the likelihood ratios w.The unconstrained 

he unconstrained 
W\ > m>2 meaning the 



minimum of £j is m>,. In Figure 2a the unconstrained minimum w\ of the - log likelihood function l\ is less than the unconstrained 
minimum >V2 of Z<i ; therefore, the likelihood function I = l\ + €2 yields a convex ROC. In Figure 2b 
corresponding -log likelihood ratio yields a non-convex ROC curve. 



Figure 4: This figure shows the fractional bias (the mean esti- 
mated value divided by the true value) of the two nonparamet- 
ric estimators of AUC in this paper as a function of AUC. The 
mean of the usual unconstrained nonparametric AUC estimate 
(equation |4| is always less than the true underlying simulation 
AUC value because the observed values are discretized. The 
constrained AUC estimate (equation [6]i is always greater than 
the unconstrained AUC estimate. The constrained AUC esti- 
mate can be considered less biased than the usual unconstrained 
emprirical AUC estimate for most values of the AUC. 
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model is I rs 
equivalently for AUC, A = 



also provides a good estimate the AUC of the above convexity- 
restricted ROC curve estimate. Our method of variance esti- 
mation can be considered as a two-way analysis of variance 
(ANOVA) with one observation per cell as in Chapter 7 of 
Sheffe (2009 ). We model I rs where each cell consists of 1 if 
the diseased score s is greater than the non-diseased score r, of 
if both scores are equal, and otherwise. See equation[5 The 
yU + a,- + b s + e rs for r = l...N;s = 1 ... M, or 

N M 

Wm X + ° r + bs + Where 

r=l s=i 

fj. represents the overall mean AUC. The random variable a r has 
mean zero and variance cr a reflecting the variability between the 
disease tests. b s has mean and variance cr/, reflecting the vari- 
ability between the non-disease tests. s rs has mean zero and 
variance <r e representing the interaction between disease and 
non-disease tests. The total variance of A is 



Var(A)=^ + ^ + ^L 
N M NM 



(8) 



The respective estimates & a , fr b , and & E of cr a , cr h , and cr e are 
computed as in equation 7.4.10 of Sheffe (2009). For simula- 
tion purposes, we chose the distribution D=U or D=N as in 



Section 6.2 and then chose the parameters of the distribution 
so that AUC=.6, AUC= .84, and AUC=.94 which respectively 
correspond to m = 1.25, m = 3.125, and m = 25/3 for D=1/ 
and to n = .358, fi = 1.406, and n = 2.199 for D-N. There 
are often more non-disease patients available than disease pa- 
tients. With that in mind, we assumed in the simulation the 
ratio of non-disease sample size to diseased sample size to take 
values r a = l,r = 3, and r a = 9. This means for each com- 
bined sample set consists of M observations treated as diseased 
and N - r a x M treated as non-diseased. We discretized each 
sample into k categories, where k takes values k=3, k=7, and 
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1 -Specificity 

(a) NPMLE ROC Curve 




1 -Specificity 

(b) Constrained NPMLE ROC Curve 



Figure 3: In Figure 3a we plotted the ROC curve of the data from Chan et al. ( 2001 1 described in Section 2.2 and given in Table |] 



The ROC curve is the set all possible measurable specificity and sensitivity pairs. We obtained the constrained estimate (dotted line) 



in Figure 3b by applying the PAVA to the data used in Figure 3a In Figure 3a we have k — 10 line segments. P, and Q, are defined 
in Section 4.2 Segments S2, S3, and S4 form a violation because the slope W4 of segment S4 is so low. Therefore we replace those 
3 segments with a single segment S' 2 with slope W2 = {ni2 + mj, + m\)N l(n.2 + + n\)M Also see Table [2] Categories Se, St, and 
S 8 also form a violation and are replaced with a single category. 
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k=16. For each quadruple (D, M, r a ,k), we generated 10,000 
independent sets of combined random samples from the chosen 
distribution. 



For each quadruple (D,M,r a ,k), we computed the uncon- 
strained AUC A J U and the estimate of variance V ] u using Ex- 
pression [8] We then applied the PAVA algorithm, computed 
the constrained AUC A J C , computed the estimate of the con- 
strained AUC variance V ] c using Expression [i] We then com- 
puted an approximate confidence interval of the AUC, 



A 1 

l-a 



A J C - Za/2 yjv{,A J c + Za/2 y]V J c , for a = .05, where z a is the 
standard normal deviate with probability a. 

For each combined sample set of 10 4 simulations, we plotted 
the square root of the mean of all the unconstrained variance 
estimates V ] u versus the standard deviation across all the simu- 
lated unconstrained AUC A J U in Figure [5] These data points lie 
on the diagonal of the plot indicating that our estimate of the 
variance of the empirical AUC estimate is unbiased (Ga llas &| 
|eTaLl|2009l l- 

Also in Figure [5] we plotted the square root of the mean of 
the constrained variance estimates V ] c versus the standard de- 
viation across all the simulated constrained AUC A c , We note 
from these plots that the true standard deviation of AUC of the 
constrained ROC curves are lower on average than the standard 
deviation of the AUC of the unconstrained ROC curves, and the 
mean estimate of the standard deviation is reduced by almost 
exactly the same amount. 

From the figure it is apparent that our standard estimator of 
empirical AUC variance also very well estimates the variance 
of the constrained empirical estimate of AUC, and it is robust 
against the large range of parameters over which we varied the 
simulation. 

To test this assertion we report in Table[T]the coverage prob- 
abilities, which are the fractions of the simulations falling out- 
side the above estimated confidence interval J°^_ a . From Table^l] 
we observe that the overall performance of the estimate of vari- 
ance is accurate except in a few cases where the following con- 
ditions exist simultaneously: a large AUC, small sample size 
and a large number of categories. 

Note that there is nothing special about the AUC variance 
estimator that we used. Other variance estimates (Bamber 
|1975| |CampbelT] |1994) l also make useful measures the vari- 
ance of the constrained AUC estimate. These simple analytic 
variance estimators agree well with bootstrap estimation tech- 



niques (Efron & Tibshirani 1993 Lloyd 2002 Lim & Won 



2012 1. This includes the simple AUC variance approxim ation 
AUC(1 - AUCXAT 1 + M _1 )/4 ( |Wagner & etaL[|l997} . This 
form makes clear the binomial-like behavior of AUC and the 
expected reduction in variance corresponding to the increase in 
AUC. 



Figure 5: This figure plots the square root of the mean of the 
variance estimates of AUC as a function of the standard devi- 
ations of the AUC estimates of our simulations. Data for both 
the usual unconstrained empirical AUC estimate and the con- 
strained AUC estimate are displayed. Each point represents 
10 4 simulated samples from one of our simulation configura- 
tions. The true standard deviation of AUC of the constrained 
ROC curves are lower on average than the standard deviation 
of the AUC of the unconstrained ROC curves, and the mean 
estimate of standard deviation is reduced by almost exactly the 
same amount. 
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7. Application 

In Section [5] we showed how to obtain the constrained ROC 



curve estimate using data from |Chan et al.| ( [2001| l. Figure 3a 
shows the application of the PAVA to the ROC curve of radi- 



ologist 5 described in Section 2.2 Ten ordinal categories were 
reduced to 6 categories to construct a convex ROC curve. In 
this section we use the same data to compute the constrained 
and unconstrained estimates of AUC as well as the estimates 
of variance of those AUC values. All these estimates are given 
in Table [3] Also given are the results for radiologist 4 reading 
images at a different resolution. 



Unconstrained 
AUC Variance 



Constrained 
AUC Variance 



Radiologist 4 
Radiologist 5 



.7386 
.6622 



.002605 
.002886 



.7618 
.68472 



.002471 
.002782 



Table 3: This table gives the constrained and unconstrained es- 
timates of AUC and the variance AUC given given the ratings 
of radiologist 5 evaluating lesions at a resolution of 140 mi- 
crons. The same estimates are given for radiologist number 4 
evaluating the lesions at a resolution of 35 microns. The esti- 
mates of AUC were calculated using Expressions]?] and [6] and 
the variance was estimated using Expression [8] 

As expected the constrained AUC estimates are greater than 
the unconstrained AUC estimates. Correspondingly the vari- 
ance estimates of the constrained AUC estimates are smaller 
than the unconstrained variance estimates. 



8. Summary 

In some studies of observer signal-detection performance we 
may assume that decisions made by human observers are ratio- 
nal, implying the convexity of a population ROC curve. In that 
scenario we may want to analyze the data under that assump- 
tion. 

In this paper we examined one such method, the constrained 
nonparametric maximum likelihood ROC curve estimate given 
by Lloyd ( 2002| l. We proved directly that this this estimate is 
a simple convex hull around the usual empirical ROC curve. 
We examined the properties of the area under this estimate 
and found that it has lower variance than the usual empirical 
AUC estimate, and it may also have lower bias with respect to 
a continous nondiscretized ROC curve. Our simulations indi- 
cate that standard estimators of variance of the Mann-Whitney- 
Wilcoxon statistic (AUC) variance may be used to obtain nearly 
unbiased and very robust variance estimates of constrained non- 
parametric maximum likelihood AUC estimate. Bootstrapping 
of the convexity algorithm (PAVA) is not required. 

The constrained ROC curve estimate is simple to construct in 
a manner similar to that suggested in |Pesce et al. ( 2010| l. First 
a regular empirical ROC curve is constructed. Then the low- 
est convex envelope that will enclose this ROC curve is drawn. 
This envelope is the new constrained nonparametric maximum 



likelihood ROC curve estimate. Holding the total number of ob- 
servations constant, we recalculate the number of observations 
in each category implied by the constrained estimate. This is 
equivalent to combining adjacent categories that do not have in- 
creasing likelihood ratios, as in Table|2] Next we use these new 
combined observation numbers to calculate the empirical AUC 
estimate in the usual manner. This is the constrained NPMLE 
of AUC. We also can use those new combined observations in 
standard nonparametric estimators of AUC variance to estimate 
the variance of the constrained NPMLE of AUC. This estima- 
tor is a useful option for nonparametric analysis of data from 
studies of observer signal-detection performance where the ob- 
servers are assumed to generate convex ROC curves rationally. 
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Table 1: Coverage Intervals with Nominal Level 0.95 



UNIFORM DISTRIBUTION 



AUC 




3 categories 


7 categories 


16 categories 


ratio r a 




1 


3 


9 


1 


3 


9 


1 


3 


9 




M=55 


.0526 


.0529 


.0554 


.0710 


.0671 


.0577 


.0632 


.0623 


.0587 


.6 


M=100 


.0493 


.0509 


.0536 


.0676 


.0625 


.0583 


.0577 


.0542 


.0578 




M=200 


.0627 


.0572 


.0562 


.0593 


.0547 


.0558 


.0534 


.0508 


.0553 




M=55 


.0639 


.0644 


.0657 


.0665 


.0697 


.0681 


.0754 


.0736 


.0796 


.84 


M=100 


.0597 


.0540 


.0581 


.0564 


.0597 


.0612 


.0644 


.0625 


.0605 




M=200 


.0493 


.0537 


.0574 


.0566 


.0569 


.0595 


.0552 


.0574 


.0580 




M=55 


.0678 


.0717 


.0696 


.0967 


.1006 


.1048 


.1056 


.1158 


.1206 


.94 


M=100 


.0623 


.0659 


.0649 


.0762 


.0808 


.0872 


.0902 


.0901 


.0954 




M=200 


.0559 


.0569 


.0557 


.0650 


.0636 


.0639 


.0718 


.0672 


.0684 


NORMAL DISTRIBUTION 




M=55 


.0519 


.0527 


.0564 


.0628 


.0576 


.0575 


.0597 


.0554 


.0574 


.6 


M=100 


.0522 


.0568 


.0523 


.0550 


.0528 


.0497 


.0531 


.0483 


.0490 




N=200 


.0489 


.0519 


.0502 


.0532 


.0507 


.0519 


.0520 


.0474 


.0468 




M=55 


.0586 


.0585 


.0627 


.0613 


.0598 


.0642 


.0722 


.0643 


.0663 


.84 


M=100 


.0514 


.0603 


.0560 


.0596 


.0571 


.0585 


.0602 


.0564 


.0522 




M=200 


.0548 


.0548 


.0513 


.0557 


.0509 


.0569 


.0610 


.0564 


.0537 




M=55 


.0725 


.0780 


.0830 


.0862 


.0764 


.0765 


.0955 


.0846 


.0891 


.94 


M=100 


.0610 


.0648 


.0702 


.0729 


.0623 


.0697 


.0720 


.0716 


.0743 




M=200 


.0592 


.0566 


.0623 


.0593 


.0611 


.0645 


.0624 


.0569 


.0614 



We generated 10,000 independent sets of combined random samples. Each combined sample consists of a sample size of N treated 
as diseased and a sample size of N — r a x M treated as non-diseased. We discretized each sample into k categories, where k 
takes values k=3, k=7, and k=16. The coverage probabilities reported in the above table are the fractions of the simulations falling 
outside the confidence interval I^ J _ described in Section 6 Here a = 0.05. 



Table 2: Distribution of Score Ratings 



Ordered Rating Category 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


Diseased Counts («,-) 


10 


8 


4 





4 


3 


4 


2 


4 


1 


Nondiseased Counts (m,) 


33 


17 


6 


6 


5 


1 





3 


1 





Likelihood Ratio (vv,) 


0.30 


0.47 


0.67 


0.00 


0.80 


3.00 


oo 


0.67 


4.00 


oo 


Constrained Diseased Counts («,) 


10 




12 




4 




9 




4 


1 


Constrained Nondiseased Counts (m,) 


33 




29 




5 




4 




1 





Likelihood Ratio (VP;) 


0.30 




0.41 




0.80 




2.25 




4.00 


oo 



This table gives the distribution of scores given by radiologist 5 for diseased and nondiseased patients. The lower rows show how 
adjacent ordered rating categories are combined to obtain a monotonically increasing likelihood ratio. This combined data is then 
used to construct an ROC curve and estimate AUC and its variance in the same manner as the original data. 
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